Scalable Parallel Topic Models
Abstract
The topic model is a popular probabilistic model for text and document modeling. It can be used for topic indexing, document classification, corpus summarization and information retrieval. In the past, topic models have been applied to corpora containing thousands to hundreds of thousands of documents. Now there is an increasing need to model collections with millions to billions of documents. We present a parallel algorithm for the topic model that has linear speedup and high parallel efficiency for shared-memory symmetric multiprocessors (SMPs). Using this parallel algorithm, topic model computations on an 8-processor system took 1/7 the time of the same computation on a single processor.
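The figures quoted in the abstract can be turned into the usual speedup and efficiency numbers. The sketch below is only an illustration of that arithmetic, not code from the paper: the function names are hypothetical, and it assumes the standard definitions (speedup = serial time / parallel time, efficiency = speedup / processor count). The times are in arbitrary units chosen so that the parallel run takes 1/7 of the serial run.

```python
def speedup(serial_time: float, parallel_time: float) -> float:
    """Ratio of single-processor time to multi-processor time."""
    return serial_time / parallel_time


def parallel_efficiency(serial_time: float, parallel_time: float, processors: int) -> float:
    """Speedup divided by the number of processors used."""
    return speedup(serial_time, parallel_time) / processors


# Assumed units: the serial run takes 7 units, the 8-processor run takes 1 unit,
# matching the "1/7 the time" figure reported in the abstract.
s = speedup(7.0, 1.0)
e = parallel_efficiency(7.0, 1.0, 8)
print(f"speedup = {s:.1f}x, efficiency = {e:.1%}")
```

Under these definitions, the reported result corresponds to a speedup of 7 on 8 processors, or a parallel efficiency of 87.5%.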
Similar resources
Scalable Inference for Logistic-Normal Topic Models
Logistic-normal topic models can effectively discover correlation structures among latent topics. However, their inference remains a challenge because of the non-conjugacy between the logistic-normal prior and multinomial topic mixing proportions. Existing algorithms either make restricting mean-field assumptions or are not scalable to large-scale applications. This paper presents a partially c...
Models, Inference, and Implementation for Scalable Probabilistic Models of Text
Title of dissertation: Models, Inference, and Implementation for Scalable Probabilistic Models of Text Ke Zhai, Ph.D., 2014 Dept. of Computer Science Dissertation directed by: Professor Jordan Boyd-Graber iSchool, UMIACS Unsupervised probabilistic Bayesian models are powerful tools for statistical analysis, especially in the area of information retrieval, document analysis and text processing. ...
Scalable Parallel Computing on Clouds: Efficient and Scalable Architectures to Perform Pleasingly Parallel, Mapreduce and Iterative Data Intensive Computations on Cloud Environments
Scalable Parallel Octree Using HPX With Hilbert Curve
and D. Fey, “Hpx – a task based programming model in a global address space,” PGAS 2014: The 8th International Conference on Partitioned Global Address Space Programming Models, 2014. [2] Zahra Khatami, Hartmut Kaiser, Patricia Grubel, Bryce Adelstein-Lelbach, Adrian Serio and J. Ramanujam, “A massively parallel distributed N-Body simulation code implemented with HPX”, 28th ACM Symposium on Par...
Parallelization of Rich Models for Steganalysis of Digital Images using a CUDA-based Approach
There are several different methods to make an efficient strategy for steganalysis of digital images. A very powerful method in this area is the rich model, consisting of a large number of diverse sub-models in both spatial and transform domains that should be utilized. However, the extraction of various types of features from an image is so time consuming in some steps, especially for training pha...
Publication date: 2007